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Sequential Monte Carlo (SMC) is a methodology for sampling 
approximately from a sequence of probability distributions of increas- 
ing dimension and estimating their normalizing constants. We pro- 
pose here an alternative methodology named Sequentially Interact- 
ing Markov Chain Monte Carlo (SIMCMC). SIMCMC methods work 
by generating interacting non-Markovian sequences which behave 
asymptotically like independent Metropolis-Hastings (MH) Markov 
chains with the desired limiting distributions. Contrary to SMC, SIM- 
CMC allows us to iteratively improve our estimates in an MCMC-like 
fashion. We establish convergence results under realistic verifiable 
assumptions and demonstrate its performance on several examples 
arising in Bayesian time series analysis. 



1. Introduction. Let us consider a sequence of probability distributions 
{vrnlngT where T = {1, 2, . . . , P}, which we will refer to as "target" distribu- 
tions. We shall also refer to n as the time index. For ease of presentation, we 
shall assume here that 7r„((ix„) is defined on a measurable space {En,J'n) 
where Ei = E, T\= T and En = En-i x E, Tn = J^n-i x J^- We denote 
Xn = {xi, . . . ,Xn) where Xi £ E for i = 1, . . . ,n. Each 7r„((ix„) is assumed 
to admit a density 7r„(x.„) with respect to a u-finite dominating measure 
denoted dx„ and dxn = dxn-i x dxn- Additionally, we have 



where 7^ : En IR"*" is known pointwise and the normalizing constant Zn is 
unknown. 
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In a number of important applications, it is desirable to be able to sample 
from the sequence of distributions {vr„}„,gT and to estimate their normaliz- 
ing constants {Zn}neT', the most popular statistical application is the class 
of nonlinear non-Gaussian state-space models detailed in Section 4. In this 
context, TTn is the posterior distribution of the hidden state variables from 
time 1 to n given the observations from time 1 to n and Zn is the marginal 
likelihood of these observations. Many other applications, including contin- 
gency tables and population genetics, are discussed in [7, 12] and [21]. 

A standard approach to solve this class of problems relies on Sequential 
Monte Carlo (SMC) methods; see [12] and [21] for a review of the literature. 
In the SMC approach, the target distributions are approximated by a large 
number of random samples, termed particles, which are carried forward over 
time by using a combination of sequential importance sampling and resam- 
pling steps. These methods have become the tools of choice for sequential 
Bayesian inference but, even when there is no requirement for "real-time" in- 
ference, SMC algorithms are increasingly used as an alternative to MCMC; 
see, for example, [6, 9] and [21] for applications to econometrics models, 
finite mixture models and contingency tables. They also allow us to imple- 
ment goodness-of-fit tests easily in a time series context whereas a standard 
MCMC implementation is cumbersome [14]. Moreover, they provide an es- 
timate of the marginal likelihood of the data. 

The SMC methodology is now well established and many theoretical con- 
vergence results are available [7]. Nevertheless, in practice, it is typically im- 
possible to, a priori, determine the number of particles necessary to achieve 
a fixed precision for a given application. In such scenarios, users typically 
perform multiple runs for an increasing number of particles until stabiliza- 
tion of the Monte Carlo estimates is observed. Moreover, SMC algorithms 
are substantially different from MCMC algorithms and can appear difhcult 
to implement for nonspecialists. 

In this paper, we propose an alternative to SMC named Sequentially In- 
teracting Markov Chain Monte Carlo (SIMCMC). SIMCMC methods allow 
us to compute Monte Carlo estimates of the quantities of interest iteratively 
as they are, for instance, when using MCMC methods. This allows us to 
refine the Monte Carlo estimates until a suitably chosen stopping time. Fur- 
thermore, for people familiar with MCMC methods, SIMCMC methods are 
somewhat simpler than SMC methods to implement, because they only rely 
on MH steps. However, SIMCMC methods are not a class of MCMC meth- 
ods. These are non-Markovian algorithms which can be interpreted as an 
approximation of P "ideal" standard MCMC chains. It is based on the same 
key idea as SMC methods; that is as 7r„+i(x„) = / 

^n+l(^n+l) d^n~\~ 

1 is often 

"close" to 7r„(x„), it is sensible to use 7r„(x„) as part of a proposal distri- 
bution to sample 7r„+i(x„_|_i). In SMC methods, the correction between the 
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proposal distribution and the target distribution is performed using Impor- 
tance Sampling whereas in SIMCMC methods it is performed using an MH 
step. Such a strategy is computationally much more efficient than sampling 
separately from each target distribution using standard MCMC methods 
and also provides direct estimates of the normalizing constants {Z„}ngx- 

The potential real-time applications of SIMCMC methods are also worth 
commenting on. SMC methods have been used in various real-time engi- 
neering applications, for example, in neural decoding [4] and in target track- 
ing [15, 27]. In these problems, it is important to be able to compute func- 
tional of the posterior distributions of some quantity of interest, but it must 
also be done in real-time. SMC methods work with collections of particles 
that are updated sequentially to reflect these distributions. Clearly, in such 
real-time problems it is important that the collections of particles are not 
too large, or else the computational burden can cause the SMC algorithm 
to fall behind the system being analyzed. SIMCMC methods provide a very 
convenient way to make optimal use of what computing power is available. 
Since SIMCMC works by adding one particle at a time to collections rep- 
resenting distributions, we can simply run it continually in between arrival 
of successive observations, and it will accrue as many particles as it can in 
whatever amount of time is taken. 

Related ideas where we also have a sequence of nested MCMC-like chains 
"feeding" each other and targeting a sequence of increasingly complex dis- 
tributions have recently appeared in statistics [19] and physics [23]. In the 
equi-energy sampler [19], the authors consider a sequence of distributions 
indexed by a temperature and an energy truncation whereas in [23] the au- 
thors consider a sequence of coarse-grained distributions. It is also possible 
to think of SIMCMC methods and the algorithms in [19] and [23] as nonstan- 
dard adaptive MCMC schemes [2, 26] where the parameters to be adapted 
are probability distributions instead of finite-dimensional parameters. Our 
convergence results rely partly on ideas developed in this field [2]. 

The rest of the paper is organized as follows. In Section 2, we describe 
SIMCMC methods, give some guidelines for the design of efficient algo- 
rithms and discuss implementation issues. In Section 3, we present some 
convergence results. In Section 4, we demonstrate the performance of this 
algorithm for various Bayesian time series problems and compare it to SMC. 
Finally, we discuss a number of further potential extensions in Section 5. The 
proofs of the results in Section 3 can be found in Appendix. 

2. Sequentially interacting Markov chain Monte Carlo. 

2.1. The SIMCMC algorithm. Let i be the iteration counter. The SIM- 
CMC algorithm constructs P sequences {Xf )},{X^*^ },..., {Xj^}. In Sec- 
tion 3, we establish weak necessary conditions ensuring that as i approaches 
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(i) 

infinity, the distribution of X„ approaches tt^; we wih assume here that 
these conditions are satisfied to explain the rationale behind our algorithm. 
To specify the algorithm, we require a sequence of P proposal distributions, 
specified by their densities 

gi(xi),g2(xi,X2),. . . ,gp(xp_i,xp). 

Each Qn : En-i x E R+ = 0) is a probability density in its last ar- 

gument Xn with respect to dxn, which may depend (for n = 2, . . . , P) on the 
first argument. Proposals are drawn from qi{-) for updates of the sequence 

{x[*^}, from q2{-) for updates of the sequence {X2 and so on. (Selection of 
proposal distributions is discussed in Section 2.2.) Based on these proposals, 
we define the weights 

/ N 7i(xi) 

(2.1) 

/ \ 7n(Xn,) „ D 

7„_1 (x„,_i )g„(x„„i , Xn) 

For any measure on (£'„_i, J>i_i), we define 
and 

(2.2) Sn = {x„ G £'„:7r„(x„) > 0}. 

We also denote by 7r„ the empirical measure approximation of the target 
distribution 7r„ given by 

1 * 

(2.3) ^^:\d^n) = --T E "^x^"" ('^^«)- 

m=0 

Intuitively, the SIMCMC algorithm proceeds as follows. At each iteration 
i of the algorithm, the algorithm samples for n € T by first sampling 
xP, then X^'^ and so on. For n = 1, {Xf^} is a standard Markov chain gen- 
erated using an independent MH sampler of invariant distribution 7ri(xi) 
and proposal distribution gi(xi). For n = 2, we would like to approximate 
an independent MH sampler of invariant distribution 7r2(x2) and proposal 
distribution (vri xg2)(x2). As it is impossible to sample from tti exactly, we 
replace tti at iteration i by its current empirical measure approximation vrj . 
Similarly for n > 2, we approximate an MH sampler of invariant distribution 
7r„(x„) and proposal distribution (7r„_i X(7„)(x.„) by replacing 7r„_i at itera- 
tion i by its current empirical measure approximation tt^-i- The sequences 
1X2^}, . . . , {Xp^} generated this way are clearly non-Markovian. 
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Sequentially interacting Markov chain Monte Carlo 

• Initialization, i = 

• For n € T, set randomly X.n^ = xlf-* G 5„. 

• For iteration i>l 

• For 71= 1 

• Sample X^*-*-* ~ qi{-). 

• With probability 

(2.4) a,(Xr),xf)) = lA^^^l(f^ 

set XP = X.f\ otherwise set X^^^ = X^'~'\ 

• For n = 2, . . . , P 

• Sample X^,^*'' ~ (vr^l^ x ?„)(•)■ 

• With probability 

(2.5) an(X^ ^X„^^) = 1A 

set Xi^^ = X*r!-'\ otherwise set X^^^ = xi^~^\ 



The (ratio of) normahzing constants can easily be estimated by 



m 

(2.6) 



(^) 

^ ' m=l 



Equation (2.6) follows from the identity 



Zfi' 



){-Kn-l X gn)('^X„) 



and the fact that asymptotically (as z — ?> cxd) XJ^^*^ is distributed according 
to (7r„_i xg„)(x„) under weak conditions given in Section 3. 

2.2. Algorithm settings. In a similar manner to SMC methods, the per- 
formance of the SIMCMC algorithm depends on the proposal distributions. 
However, it is possible to devise some useful guidelines for this sequence 
of (pseudo-) independent samplers, using reasoning similar to that adopted 

in the SMC framework. Asymptotically, Xji*-*^ is distributed according to 
(7r„_i xg„)(x„) and w„(x„) is just the importance weight (up to a normaliz- 
ing constant) between 7r„(x„) and (7r„_i X(7„)(x„). The proposal distribution 
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minimizing the variance of this importance weight is simply given by 

(2.7) g°P*(x„_i,Xn) =7f„(x„_i,x„), 

where 7f„(x„_i,x„) is the conditional density of x„ given x„_i under 7r„, 
that is, 

(2.8) 7r„(x„_i,x„ 



7rn(x„,_i) 

In the SMC literature, 7f„(x„_i, a;„) is called the "optimal" importance den- 
sity [11]. This yields 

(2.9) u;°P*(x„) a7r„/„_i(x„„i), 

where 



(2.10) 7r„/„_i(x„_i) 
with 



7rri.(Xn-l) 
7rn-l(x„_i) 



7r„(x„„i)= / 7r„,(x„)(ix„. 
Je 



In this case, as tt;^''*(x„) is independent of x„, the algorithm described above 
can be further simplified. It is indeed possible to decide whether to accept or 
reject a candidate even before sampling it. This is more computationally effi- 
cient because if the move is to be rejected there is no need to sample the can- 
didate. In most applications, it will be difficult to sample from (2.7) and/or 
to compute (2.9) as it involves computing 7r„(x„_i) up to a normalizing 
constant. In this case, we recommend approximating (2.7). Similar strate- 
gies have been developed successfully in the SMC framework [5, 11, 22, 25]. 
The advantages of such sampling strategies in the SIMCMC case will be 
demonstrated in the simulation section. 

Generally speaking, most of the methodology developed in the SMC set- 
ting can be directly reapplied here. This includes the use of Rao-Blackwellisa- 
tion techniques to reduce the dimensionality of the target distributions [11, 
22] or of auxiliary particle-type ideas where we build target distributions 
biased toward "promising" regions of the space [5, 25]. 

2.3. Implementation issues. 

2.3.1. Burn-in and storage requirements. We have presented the algo- 
rithm without any burn-in. This can be easily included if necessary by con- 
sidering at iteration i of the algorithm 



^ ' ' m=l{i,B) 
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where 

l{i,B) = 0\/ {{i- B) /\B), 

where B is an appropriate number of initial samples to be discarded as 
burn-in. Note that when i > 2B, we have B) = B. 

Note that in its original form, the SIMCMC algorithm requires storing 
the sequences {X^i'^}„gT- This could be expensive if the number of target 
distributions P and/or the number of iterations of the SIMCMC are large. 
However, in many scenarios of interest, including nonlinear non-Gaussian 
state-space models or the scenarios considered in [9], it is possible to dras- 
tically reduce these storage requirements as we are only interested in es- 
timating the marginals {7r„(xn)} and we have t(;„(x„) = Wn{xn-i,Xn) and 

(7„(x„_i, Xn) = qn{xn-i,Xn)- In such cases, we only need to store {Xn^}n£T, 
resulting in significant memory savings. 

2.3.2. Combining sampling strategies. In practice, it is possible to com- 
bine the SIMCMC strategy with SMC methods; that is we can generate 
say N (approximate) samples from {vrnj^gf using SMC then use the SIM- 
CMC strategy to increase the number of particles until the Monte Carlo 
estimates stabilize. We emphasize that SIMCMC will be primarily useful in 
the context where we do not have a predetermined computational budget. 
Indeed, if the computational budget is fixed, then better estimates could 
be obtained by switching the iteration i and time n loops in the SIMCMC 
algorithm. 

2.4. Discussion and extensions. Standard MCMC methods do not ad- 
dress the problem solved by SIMCMC methods. Trans-dimensional MCMC 
methods [17] allow us to sample from a sequence of "related" target dis- 
tributions of different dimensions but require the knowledge of the ratio of 
normalizing constants between these target distributions. Simulated tem- 
pering and parallel tempering require all target distributions to be defined 
on the same space and rely on MCMC kernels to explore each target dis- 
tribution; see [20] for a recent discussion of such techniques. As mentioned 
earlier in the Introduction, ideas related to SIMCMC where a sequence of 
"ideal" MCMC algorithms is approximated have recently appeared in statis- 
tics [19] and physics [23]. However, contrary to these algorithms, the target 
distributions considered here are of increasing dimension and the proposed 
interacting mechanism is simpler. Whereas the equi-energy sampler [19] al- 
lows "swap" moves between chains, we only use the samples of the sequence 
{X^*^} to feed {X^*^^} but {X^*^-,^} is never used to generate {X^*^}. 

There are many possible extensions of the SIMCMC algorithm. In this 
respect, the SIMCMC algorithm is somehow a proof-of-concept algorithm 
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demonstrating that it is possible to make sequences targeting different dis- 
tributions interact without the need to define a target distribution on an 
extended state space. For example, a simple modification of the SIMCMC 
algorithm can be easily parallelized. Instead of sampling our candidate 
at iteration i according to {t^^-i ^ Qn){^) we can sample it instead from 

i^n-i^ ^ 9n)(')- this allows us to simulate the sequences {X^*^} on P par- 
allel processors. It is straightforward to adapt the convergence results given 
in Section 3 to this parallel version of SIMCMC. 

In the context of real-time applications where 7r„(x„) is typically the 
posterior distribution p(x„|yi:„) of some states x„ given the observations 
yi:n, SIMCMC methods can also be very useful. Indeed, SMC methods can- 
not easily address situations where the observations arrive at random times 
whereas SIMCMC methods allow us to make optimal use of what computing 
power is available by adding as many particles as possible until the arrival of 
a new observation. In such cases, a standard implementation would consist 
of updating our approximation of 7r„(x„) at "time" n by adding iteratively 
particles to the approximations TTn~L+ii^n-L+i), ■ ■ ■ ,7r„-i(x„_i),7r„(x„) for 
a lag L>1 until the arrival of data Un+i- For L = 1, such schemes have been 
recently proposed independently in [27]. 

3. Convergence results. We now present some convergence results for 
SIMCMC. Despite the non-Markovian nature of this algorithm, we are here 
able to provide realistic verifiable assumptions ensuring the asymptotic con- 
sistency of the SIMCMC estimates. Our technique of proof rely on Poisson's 
equation [16]; similar tools have been used in [2] to study the convergence of 
adaptive MCMC schemes and in [10] to study the stability of self-interacting 
Markov chains. 

Let us introduce B{En) = {/« -.En^^ such that < 1} where ||/n|| = 
supx^g^^ |/n(x„)|. We denote by E^(o) [•] the expectation with respect to the 

distribution of the simulated sequences initialized at x^'^^^ := (x^'^^Xg'^^ . . . , 
x.^^"*) and No = N U {0}. For any measure /i and test function /, we write 

^i{f) = ^^l{dx)f{x). 

Our proofs rely on the following assumption. 

Assumption A1. For any n € T, there exists Bn < oo such that for any 

Xn Sn 

(3.1) )<Bn. 

This assumption is quite weak and can be easily checked in all the ex- 
amples presented in Section 4. Note that a similar assumption also appears 
when Lp bounds are established for SMC methods [7]. 
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Our first result establishes the convergence of the empirical averages to- 
ward the correct expectations at the standard Monte Carlo rate. 

Theorem 3.1. Given Assumption Al, for any n G T and any p>l, 
there exist Ci^„, C2,p < oo such that for any x^'^^^ G 5i x • • • x Sn, fn € B{En) 
and i G No 

Using (2.6), a straightforward corollary of Theorem 3.1 is the following 
result. 

Theorem 3.2. Given Assumption Al, for any n G T and any p > 1, 
there exist Ci^„, C2,p < oo such that for any x^'^^„ G 5i x • ■ • x Sn, fn £ B{En) 
and i G No 

and for ?i G T \ {1} 



E 



Zn-lJ 



n-1 



- ■ 



By the Borel-Cantelli lemma, Theorems 3.1 and 3.2 also ensure almost 
sure convergence of the empirical averages and of the normalizing constant 
estimates. 

Our final result establishes that each sequence {X^i*"*} converges toward 7r„. 

Theorem 3.3. Given Assumption Al, for any n G T, x^*^^^ G 5i x • • • x 
Sn and fn G B{En) we have 

limE (0) [/„(X«)]=7r„(/„). 

4. Applications. In this section, we will focus on the applications of 
SIMCMC to nonlinear non-Gaussian state-space models. Consider an un- 
observed valued Markov process {Xn}n&'f satisfying 

Xn\{Xn-l=x}^f{x,-). 

We assume that we have access to observations {l^jneT which, conditionally 
on {X„}, are independent and distributed according to 

(4.1) Yn\{Xn = x}r^g{x,-). 

This family of models is important, because almost every stationary time 
series model appearing in the literature can be cast into this form. Given 
yi ;P, we are often interested in computing the sequence of posterior distri- 
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butions {p{xn\yi:n)}neT to perform goodness-of-fit and/or to compute the 
marginal likelihood p{yi:p)- By defining the unnormalized distribution as 

n 

(4.2) 7„(x.„) = p(x„, yi:n) = fi{xi)g{xi,yi) JJ f{xk-i,Xk)g{xk,yk) 

k=2 

(which is typically known pointwise), we have 7r„(x„) = p(x„|yi ; „) and Zn = 
vivi-.n) so that SIMCMC can be applied. 

We will consider here three examples where the SIMCMC algorithms 
are compared to their SMC counterparts. For a fixed number of itera- 
tions/particles, SMC and SIMCMC have approximately the same computa- 
tional complexity. The same proposals and the same number of samples were 
thus used to allow for a fair comparison. Note that we chose not to use any 
burn-in period for the SIMCMC and we initialize the algorithm by picking 
yih' = (x^'^lj^, x^f^) for any n where Xp^ is a sample from the prior. The SMC 
algorithms were implemented using a stratified resampling procedure [18]. 
The first two examples compare SMC to SIMCMC in terms of log-likelihood 
estimates. The third example demonstrates the use of SIMCMC in a real- 
time tracking application. 

4.1. Linear Gaussian model. We consider a linear Gaussian model where 

Xn = AXn-l + (TyVn, 

(4.3) 

Yn = Xn + (JwWn, 

with Xi ~AA(0,A), k''~'AA(0,A), VF„ Ar(0, A), A = diag(l, . . . , 1) and 
^ is a random (doubly stochastic) matrix. Here M{^, S) is a Gaussian distri- 
bution of mean /x and variance-covariance matrix S. For this model we can 
compute the marginal likelihood Zp = p{yi:p) exactly using the Kalman 
filter. This allows us to compare our results to the ground truth. 

We use two proposal densities: the prior density /(x„_i,x„) and the op- 
timal density (4.3) given by q^^{yin-i,Xn) oc f{xn-i,Xn)g{xn,yn) which is a 
Gaussian. In both cases, it is easy to check that Assumption Al is satisfied. 

For d = 2,5, 10, we simulated a realization of P = 100 observations for 
ay =2 and cr^ = 0.5. In Tables 1 and 2, we present the performance of both 
SIMCMC and SMC for a varying d, a varying number of samples and the 
two proposal distributions in terms on Root Mean Square Error (RMSE) of 
the log-likelihood estimate. 

As expected, the performance of our estimates is very significantly im- 
proved when the optimal distribution is used as the observations are quite 
informative. Unsurprisingly, SMC outperform SIMCMC for a fixed compu- 
tational complexity. 
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Table 1 

RMSE for SMC and SIMCMC algorithms over 100 realizations for prior proposal 



N 1000 2500 5000 10,000 25,000 



SMC, d= 2 1.66 0.98 0.63 0.52 0.29 

SIMCMC, d = 2 1.57 0.97 0.75 0.59 0.41 

SMC, d = 5 4.84 4.76 3.06 2.18 1.59 

SIMCMC, d = 5 5.57 5.41 4.12 2.36 1.83 

SMC, d=10 16.91 14.57 11.14 10.61 8.91 

SIMCMC, d = 10 18.22 16.78 14.56 12.46 11.25 



4.2. A nonlinear non-Gaussian state-space model. We now consider a 
nonlinear non-Gaussian state-space model introduced in [18] which has been 
used in many SMC publications: 

Xn = — h — — h 8cos(1.2n) + Vn, 

v2 

i^n = — + Wn, 

20 

where Xi ~ AA(0,5), K ''^ ' A/'(0, a^) and W„ ''^ ' A/'(0, ct^ ). As the sign of 
the state Xn is not observed, the posterior distributions {p{xi n\yi ■. n)} are 
multimodal. SMC approximations are able to capture properly the multi- 
modality of the posteriors. This allows us to assess here whether SIMCMC 
can also explore properly these multimodal distributions by comparing SIM- 
CMC estimates to SMC estimates. 

We use as a proposal density the prior density In this case, 

it is easy to check that Assumption Al is satisfied. 

In Table 3, we present the performance of both SIMCMC and SMC for 
a varying number of samples and a varying cr^ whereas we set cr^ = 5. Both 
SMC and SIMCMC are performing better as the signal to noise ratio de- 
grades. This should not come as a surprise. As we are using the prior as a 

Table 2 

RMSE for SMC and SIMCMC algorithms over 100 realizations for optimal proposal 



N 


1000 


2500 


5000 


10,000 


25,000 


SMC, d=2 


0.33 


0.17 


0.09 


0.06 


0.04 


SIMCMC, d = 2 


0.37 


0.19 


0.14 


0.11 


0.06 


SMC, d=5 


0.28 


0.16 


0.10 


0.07 


0.06 


SIMCMC, d = 5 


0.29 


0.23 


0.15 


0.12 


0.07 


SMC, d=10 


0.18 


0.14 


0.09 


0.05 


0.07 


SIMCMC, d=10 


0.31 


0.20 


0.16 


0.12 


0.10 
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Table 3 

Average RMSE of log-likelihood estimates for SMC and SIM CMC algorithms over 100 

realizations 



N 




2500 


5000 


10,000 


25,000 


50,000 


SMC, gI = 1 




0.80 


0.55 


0.40 


0.24 


0.17 


SIMCMC, al 


= 1 


0.95 


0.60 


0.75 


0.59 


0.41 


SMC, al = 2 




0.33 


0.23 


0.17 


0.11 


0.07 


SIMCMC, al 


= 2 


0.91 


0.70 


0.50 


0.38 


0.29 


SMC, al^5 




0.13 


0.10 


0.08 


0.05 


0.03 


SIMCMC, al 


= 5 


0.28 


0.21 


0.19 


0.12 


0.08 



proposal, it is preferable to have a diffuse likelihood for good performance. 
Experimentally we observed that SIMCMC and SMC estimates coincide for 
large N. However for a fixed computational complexity, SIMCMC is outper- 
formed by SMC in terms of RMSE. 



4.3. Target tracking. We consider here a target tracking problem [15, 21]. 
The target is modeled using a standard constant velocity model 

/I T 0' 



(4.4) 



where K 



Xn 







Vo 



'•~-AA(0,S), with T: 



/rV3 



V 




+ Vn 



The state vector Xn = {X^, X^, X^, 






TVs 

X^)^ is such that X^ (resp., X^) 




cor- 



responds to the horizontal (resp., vertical) position of the target whereas 
(resp., Xn) corresponds to the horizontal (resp., vertical) velocity. In 
many real tracking applications, the observations are collected at random 
times [13]. We have the following measurement equation: 



(4.5) 



Yn = tan 



where Wn ' A/'(0, 10~^); these parameters are representative of real- world 
tracking scenarios. We assume that we only observe Yn at the time indexes 
n = 4:k where € N and, when n 7^ 4/c, we observe Yn with probability 
p = 0.25. We are here interested in estimating the associated sequence of 
posterior distributions on-line. 
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Table 4 

Average RMSE of the Monte Carlo state estimate 



Algorithm 


SMC with N 


SMC with N' 


SIMCMC 


Average RMSE 


2.14 


3.21 


1.62 



Assume the computational complexity of the SMC method is such that 
only N = 1000 particles can be used in one time unit. In such scenarios, we 
can either use SMC with A'^ particles to estimate the sequence of posterior 
distributions of interest or SMC with say N' = 4000 particles and chose to 
ignore observations that might appear when Ak. These are two standard 
approaches used in applications. Alternatively, we can use the SIMCMC 
method to select adaptively the number of particles as discussed in Sec- 
tion 2.4. If SIMCMC algorithm only adds particles to the approximation 
of the current posterior at time n, it will use approximately mN particles 
to approximate this posterior if the next observation only appears at time 
n + m. 

We simulated 50 realizations of P = 100 observations according to the 
model (4.4) and (4.5) and use as a proposal density the prior density f{xn-i, 
Xn)- This ensures that Assumption Al is satisfied. In Table 4, we display the 
performance of SMC with N particles, N' particles (ignoring some observa- 
tions) and SIMCMC using an adaptive number of particles in terms of the 
average RMSE of the estimate of the conditional expectation E(X„|yi:„). 
In such scenarios, SIMCMC methods clearly outperforms SMC methods. 

5. Discussion. We have described an iterative algorithm based on inter- 
acting non-Markovian sequences which is an alternative to SMC and have 
established convergence results validating this methodology. The algorithm 
is straightforward to implement for people already familiar with MCMC. 
The main strength of SIMCMC compared to SMC is that it allows us to 
iteratively improve our estimates in an MCMC-like fashion until a suitable 
stopping criterion is met. This is useful as in numerous applications the 
number of particles required to ensure the estimates are reasonably precise 
is unknown. It is also useful in real-time applications when one is unsure of 
exactly how much time will be available between successive arrivals of data 
points. 

As discussed in Section 2.4, numerous variants of SIMCMC can be eas- 
ily developed which have no SMC equivalent. The fact that such schemes 
do not need to define a target distribution on an extended state-space ad- 
mitting {nn}n&T as marginals offers a lot of flexibility. For example, if we 
have access to multiple processors, it is possible to sample from each 7r„ 
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independently using standard MCMC and perform several interactions si- 
multaneously. Adaptive versions of the algorithms can also be proposed by 
monitoring the acceptance ratio of the MH steps. If the acceptance probabil- 
ity of the MH move between say 7r„_i and 7r„ is low, we could, for example, 
increase the number of proposals at this time index. 

From a theoretical point of view, there are a number of interesting ques- 
tions to explore. Under additional stability assumptions on the Feynman- 
Kac semigroup induced by {7r„}„gT and {gnjnGT [7], Chapter 4, we have 
recently established in [8] convergence results ensuring that, for functions of 
the form /n(x„) = fn{xn)-, the bound Ci^n in Theorem 3.1 is independent of 
n. A central limit theorem has also been established in [3]. 

APPENDIX: PROOFS OF CONVERGENCE 

Our proofs rely on the Poisson equation [16] and are inspired by ideas de- 
veloped in [1, 2, 10]. However, contrary to standard adaptive MCMC schemes 
[2, 26], the Markov kernels we consider do not necessarily admit the target 
distribution as invariant distribution; see [10] for similar scenarios. However, 
in our context, it is still possible to establish stronger results than in [10] as 
we can characterize exactly these invariant distributions; see Proposition 1. 

A.l. Notation. We denote by V{En) the set of probability measures on 
{En,J~n)- We introduce the independent Metropolis-Hastings (MH) kernel 
Ki:EiX 1] defined by 

Ki (xi , dx^ ) = ai (xi , x'l ) qi {dx[ ) 

(A.l) 

For n E {2, . . . , P}, we associate with any fin-i S 'P(-E'n-i) the Markov kernel 
Kn,^i„-i -EnXTn^ [0, 1] defined by 

-f^n,At„-i(Xn,dx'„) = a„ (x.„ , x^) X g„)((ix'„) 

(A.2) 

+ ^1 - y a„(x.„,y„)(^f„„i X qn){dyn)^^x„{dx'J, 

where x^ = (xj^_^,x^). In (A.l) and (A.2), we have for n G T 

Wn{y^'n) 



an(x„,x^) = 1 A 



We use II • lltv to denote the total variation norm and for any Markov 
kernel 



K\x,dx') := j K'~\-K,dy)K{y,d^). 
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A. 2. Preliminary results. We establish here the expression of the in- 
variant distributions of the kernels i^i(xi,dx'^) and (x„, dx^) and 
establish that these kernels are geometrically ergodic. We also provide some 
perturbation bounds for Er„^^^_-^ (x„,, dx^) and its invariant distribution. 

Lemma 1. Given Assumption Al, (xi , (ix'^ ) is uniformly geometri- 
cally ergodic of invariant distribution 7ri((ixi). 

By construction, i^i (xi , dx'^ ) is an MH algorithm of invariant distribution 
7ri(dxi). Uniform ergodicity follows from Assumption Al; see, for example. 
Theorem 2.1. in [24]. 

Proposition 1. Given Assumption Al, for any n G {2, . . . , P} and any 
fJ-n-i €'P(£'n-i), -f^n,^„_i (xn, dx^) is uniformly geometrically ergodic of in- 
variant distribution 

„^ . w, ^ 7r„/„_i(x,,_i) • X 7fn)((ix„) 
(A.3) a;„,(/i„,_i)(dx„) = , 

fJ-n-l[T^n/n-l) 

where 7f„(x„_i, dxn) and ■Kn/n-ii'^n-i) are defined, respectively, in (2.8) 
and (2.10). 

Proof. To establish the result, it is sufficient to rewrite 

)/7rn-l(Xn-l)/U„_l(x„_i) 

Zn-i X g„)(x„) 

Zn ■^n/n— l(Xn— 1 

Zn^l X g„)(x„) 

This shows that (xn, c?x^) is nothing but a standard MH algorithm of 

proposal distribution (nn-i x g„)(x„) and target distribution given by (A.3). 
This distribution is always proper because Assumption Al implies that 
vr„/„_i(x„_i) < oo over En-i- Uniform ergodicity follows from Theorem 2.1. 
in [24]. □ 

Corollary. It follows that for any n € {2, . . . ,P} there exists pn < 1 
such that for any m € Nq and x„ G £"„, 

(A.4) ||K-^_^(x„,.)-a;„,(/.„_i)(.)||tv<P^. 

Proposition 2. Given Assumption Al, for any n € {2, . . . , P}, we have 
for any Pn-i, i^n-i G 'P{En-i), x^ € En and m € N 

(A.5) ||K™^_^(X„,-) -^^,„_,(x„,-)||tv < -^ll/Un-l -i^n-llltv 

and 

2 

(A. 6) - a;„,(i/„_i) |[tv < ||Ain.-i -i^n-i||tv 

1 - Pn 
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Proof. For any /„ € B{En), we have the following decomposition: 

^;::M„-,(/«)(xn)-K™.„_,(/n)(xn) 

m—l 

~ -^n,/^„_i (-^n,Mn-i ~ ^n,Un-i)Kn,uJ^i (/n)(Xn) 

j=0 

j=0 

From Assumption Al, we know that for any Un-i £ T^{En-i) 
and from (A. 2) for any x„ € £"„, and /„ € B[En) 



n—1 l^n—l) ^ Qn 
+ U 



thus 

\\Kn <2||/i n—1 ^n— l||tv 

So 



m— 1 

i=o 

r) ^ r n II II 

— ^"j — '^n-l tv 

1 - Pn 

Hence, (A. 5) follows and we obtain (A. 6) by taking the limit as m oo. □ 

A.3. Convergence of averages. For any n € {2, . . . , P}, p > 1 and /„ € 

B[En) we want to study 

V) [|^i*^(/n)-vr„(/„)n'/^ 

^1 ; n 

We have 

(A.7) - 7r„(/„) = - + 5«(/„) - T^nUn), 

where 



m=0 
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To study the first term on the right-hand side of (A. 7), we introduce the 
Poisson equation [16] 

fn{x) - a;„(/i)(/„) = fn,f,{x) - Kn,f,{fn,t,){x), 

whose solution, if Kn,^ is geometrically ergodic, is given by 

(A.8) lA^) = ^ [KJU){x) - uMifn)]. 

ieNo 

We have 

(i + l)[?f«(/„)-5»] 

i 

(A.9) = Mt'\U) + ^ [4 ^(.+.) (X(™+1)) - I (X(™+1))] 

■m=0 

where 

(A.IO) m(^)(/„) = ^ [4 (X(™+1)) - (4 )(x(r))] 

m=0 

is a ^^-martingale with = (t(X^^' *\ Xg^' *\ . . . , Xn ' where we define 

xi^-) = (xl^...,x«). 

We remind the reader that B{En) = {fn-En K such that < 1} 
where ||/n|| = sup^^g^^ |/„(x„)|. We establish the following propositions. 



(0) 



Proposition 3. Given Assumption Al, for any n G {2, . . . ,P}, x]^. 
P ^ 1; /n € B[En) and m € Nq, we have 

E^(o) [i4 („,(x(r+^))n^/^<-^. 



Proof. Assumption Al ensures that f^^(m) is given by an expression 
of the form (A.8). We have 



<5;e^(o, [e^(o) (iK;.(„,(/„)(x(r+^))-..„(??S)(/„)nc-i)] 



< 

ieNo 



18 



A. BROCKWELL, P. DEL MORAL AND A. DOUCET 



using Minkowski's inequality and the fact that K is an uniformly er- 

godic Markov kernel conditional upon Q^-i using Assumption Al. □ 

Proposition 4. Given Assumption Al, for any n € {2, . . . , P} and any 
p > 1, there exist Bi^n,B2.p < oo such that for any x^'^^„, /„ G B{En) and 
m G N 

E mirhfnm^'" < i?i,„P2,pmV2. 



Proof. For p > 1 we use Burkholder's inequality (e.g., [28], page 499); 
that is, there exist constants Ci^n,C2,p < oo such that 

E „,) [\M(rHfn)ff^ 



(A.ll) < C7i,„C2,pE 



(0) 



m—l 



i=0 



p/2- 



For p € (1,2), we can bound the left-hand side of (A.ll) 



i/p 



E 



(0) 



^ m—l 



p/2- 



. 1=0 



l/p 



< E 



(0) 
1 : Ti 



m—l 



p/2- 



< E 



(0) 
1 : n 



1=0 
m—l 



'"n-l "'"n-l 



l/p 



1/2 



2 E[i4,.(^(xr^^)i' + i^.,,(^(4,,(^)(x«) 

using (a — 6)^ < 2(a^ + 6^) and Jensen's inequality. Now using Jensen's in- 
equality again, we have 

V [I^n5f« (45f« )(X»)|2]<E ,0) [J^„^« (|4^„ |2)(X, 

and using Proposition 3, we obtain the bound 



E 



(0) 
1 ; n 



/ m—l 



p/2- 



, i=0 



l/p 



< 



I- Pn 



-m 



1/2 



SEQUENTIALLY INTERACTING MCMC 19 

For p > 2, we we can bound the left-hand side of (A. 11) through Minkows- 
ki's inequahty 

/■m-1 



^l:n "'"n-l 
, 1 = 



\ 1/2 

Using Minkowski's inequahty again 

E^(o) [i4^« (x(:+i))-i^„^(.) (4^(.) )(x»)n 



Now from Proposition 3 and Jensen's inequahty, we can conclude for p>l. 
For p= 1, we use Davis' inequality (e.g., [28], page 499) to obtain the result 
using similar arguments which are not repeated here. □ 



Proposition 5. Given Assumption Al, for any ra G {2, . . . , P} and p > 

•,d 1 

B„ 



1 there exists Bn < oo such that for any x^'^^^, G B{En) and m G Nq 



E.(o) [1/ ^(..+i)(x(r+^)) - / (x(r+i))n'/^ < 



m + 2 



Proof. Our proof is based on the following key decomposition estab- 
lished in Lemma 3.4. in [10]: 

4 „(,„+i, (X(r+i)) - 4 (X(r+i)) + a;„(5fl"!t'^)(4 ^(-) ) 

(A.12) = ^ (<5x(n.+i) -o;,(7fi'!t'^))i^;.(„+,)(K^^(„.+i) -i^^^(™)) 

■■^ ' "if„_l "-'^n-l "'''n-1 

xi^^(„) (/n-a;.(7fS)(/.)). 

We have 

\{S (m+l) - UJn{Trl^t^^))K' {K - (,n) ) 

"'"^n-l "'''n-1 "'''n-l 

xi^^(„) (/n-o;„(^a)(/„))| 

"''^n-l 

= l('^x('"+^) ~ ^"(^i-t^^))^' -(^+1) 5f(™+i) - ^„ ~(.n) (/n.)| 

^n. "''^n-1 "■''^n-1 "''^n-1 "'^n-l 
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(A.13) <fi\\{6 i^+l) - W„(?fi,'^t^^))jr ^i,^ + l){K (m + l) - K Mm) 



^7 - ||^(m+l) ^(m.) II 11 /J- f^\'/'^^^)\\ T^i 



(m+l)^ 



1 - Pn 



"i+l) lltv 

"n-1 



^7 - ||^(m+l) ^(m) II ills: z^im-t- 

^ Pn " 

n ||^(m+l) ^(m) n 
— 1 _ ^ lFn-1 ~'^n-llltv' 

using Assumption Al, (A. 5) in Proposition 2 and Assumption Al again. 
Now we have 

K(7fi™t'^)(/^(™))| 



(m+l)^ 



(A.14) 



(m.+l) , 



a;n(?fS))i^:,(.)(/n)| 



(m+l)^ 



^ i II /^(m4 

ieNo 



■ w„ vr, 



nV'n-iyiltv 



, 2 (jjj+l) ^(m) II 



using Assumption Al and (A. 6) in Proposition 2. 
Now for any fn~i € B{En~i), we have 



^(m+l) / J, X ^yrii) / j. x 



(m) 



m + 2 



m + 2 



thus 
(A.15) 



The resuh fohows now directly combining (A. 12), (A.13), (A.14), (A.15) and 
using Minkowski's inequahty. □ 



Proposition 6. Given Assumption Al, for any n S {2, . . . , P} and any 
p>l there exists Bi^n, B2^p < oo such that for x^*^^^, /„ € B{En) and i G Nq 



E 



(0) 



[i-«(/n)-5»(/.)n'/^<^x#f. 
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Proof. Using (A. 9) and Minkowski's inequality, we obtain 

^1 : n 

[I -\- I) -^1 ; n 

(A.16) + — — y E [1/ (x(r+'^) - / ^(.n) (x(r+'^)r]'/^ 



m=0 



i + 1 

The first term on the right-hand side of (A.16) is bounded using Propo- 
sition 4, the term on the last line of the right-hand side are going to zero 
because of Proposition 3. For the second term, we obtain using Proposition 5 

y E [1/ „(,.+i,(x(-+i)) - / (x(™+i))r]^/p < y 

rn=0 m=0 

<B„log(i + 2). 

The result follows. □ 

Proof of Theorem 3.1. Under Assumption Al, the result is clearly 
true for n = 1 thanks to Lemma 1. Assume it is true for n — 1 and let us 
prove it for n. We have, using Minkowski's inequality, 

+ E^(o, [|5«(/n)-vr„(/„)ni/^ 

The first term on the right-hand side can be bounded using Proposition 6. 
For the second term, we have 



E.(0) [\S^\fn)-Mfnrf' 



Using (A. 3), we obtain 

UJn{TTn-l){fn) - (vfi"!l) (/«) 
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^(m) / \ / \ 



+ 



^(m) / \ / \ 

T^n-li.'^n/n-l) ' ^Tn-l (,7r.„/„_i j 

SO, as TTn-iiTfri/n-i) = have 

i)(/n)| 

< |((7r„_i - 7?^^"!^^) X 7f„)(7r.„/„_i • 



/n) • (^r; 

^(m) / N 
l^n/n- ij 



(m) 



+ 



Assumption Al implies that there exists Dn < oo such that 7r„/„„;^(x„_i) < 
Dn over En-i- Thus, we have using the induction hypothesis 



E 



(0) 
1 : n 



[|^n(??S)(/n)-o;„(vr„_i)(/„)r]i/^ 

(m) I ^n/n—l 



1 ; n 



< 



n-1 

2-DnC'l,n-lC'2,p 

(m+l)V2 



vr, 



n/n— 1 
Dr, 



p-i 1/p 



and 



E 



(0) 



ii + l) ^„(m + l)^/^ 

m=0 ^ ' 



< 



DnCin-lC; 



2,P 



(i + 1)1/2 • 



This concludes the proof. □ 
A.4. Convergence of marginals. 

Proof of Theorem 3.3. The proof is adapted from Proposition 4 
in [2]. For n = 1 the result follows directly from Assumption Al. Now 
consider the case where n > 2. We use the following decomposition for 
0<n{i) <i: 

|E ,0) [/.(X«)-^„(/0]| 



< IE 



(0) 



[/„,(x«)-K"(:) ,/„(x(-"»))]i 
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+ |E (0) M^tf^^fn) - U;n{7rn-l){fn)]\. 

^1 : n 

Assumption Al implies that 

For the first term, we use the following decomposition: 

n{i) 



and 



i=2 



:E ,0) [E ,0) [K^':.l,_,+,)(/„)(Xf 



i^^'-(,-„(/n)(xr^-+i))igr^]], 

n,7r, ' 



where 

i^''"k+i)(/n)(xr^'+^)) - i^^'-;,_,,(/„)(x(-^-+i)) 



n,7r. 



n-l 

J-2 



m=0 
m=0 

X (i^^'-i-Tr'(/n)(xr^+^)) -a;„(Ci))(/n)). 

Now we have from Proposition 2 that 



;(i-J+l) ^{i-j) I 



< 



2 1 

(1-pO^-J + 2 
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and using Assumption Al 



and 



< 



E 



(0) 
1 : n 



E (0 



(0) 
1 : n 



J — 



.m=0 



< 



{l-pn){i-j + 2) 



i-2 

m=0 



m-2 



|E ,0) [/„(X«) _„,.(/„)(Xr" 



(l-p„)(i-j + 2) l-pn 



< 



< 



n(i) 

2 1 



2 



■log 



[l-pnY ''\i-n{i) + l^ 

Finally, to study the last term E[a;„,(7f|j*_"^*''^ )(/„,) — i^n(vr„,-i)(/„,)], we use 
the same decomposition used in the proof of Theorem 3.1 to obtain 

iE[o.„(??j:rr^*)))(/„)-w„(7r„_i)(/„)]i 



< 2L'„E^(o) 



< 



1 ; n 

2-071^1,71-1 



Dn 



(i-n(i)) ( ^n/n-1 \ ( '^n/n-l 

<-l ( 1 - 



(i -n(i) + 1)1/2 ■ 

One can check that |E (o) [/„(X^*^) — Trnifn)]\ converges toward zero for 



nil] 



[i^^J where < a < 1. □ 
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